ABSTRACT


The objective of this work is to develop methodologies to detect and report images that do not comply with the
recruitment requirements of the Indian Space Research Organisation (ISRO). The recruitment software hosted at the
U. R. Rao Satellite Centre (URSC) is responsible for handling the recruitment activities of ISRO. A large number of
online applications are received for each advertised post, and in many cases candidates upload either wrong or
non-compliant images of the required documents. By non-compliant images, we mean images that do not contain faces
or in which the faces are not clear enough. In this work, we attempt to address two specific problems: 1) to
recognise whether an image uploaded to the recruitment portal contains a human face, which is addressed using a face
detection algorithm; and 2) to check whether the images uploaded with two or more applications are the same, which
is achieved by using machine learning (ML) algorithms to generate a similarity score between two images and then
identifying the duplicate images. Screening valid applications is very challenging, as verifying such images
manually is very time-consuming and requires a large human effort. Hence, we propose novel ML techniques to detect
duplicate and non-face images in the applications received by the recruitment portal.


This is an open access article under the CC BY-SA license. 
1. INTRODUCTION


The Computers and Information Group (CIG) of the U. R. Rao Satellite Centre (URSC) is involved in the development,
customization, and management of the software used for the recruitment activities of the Indian Space Research
Organisation (ISRO) [1], [2]. Recruitment is the process of sourcing, screening, and selecting candidates for a
vacancy within an organization. Several advertisements are released each year, and a few lakh applications are
received annually. Screening and processing such a huge volume of applications manually not only requires a large
human effort but may also lead to inconsistent results. Automation is the most practical way to relieve staff of
such repetitive tasks. Based on the expertise gained over the years, certain checks that can be generalized as a set
of rules are already automated. In addition to these rule-based automations, in this work we explore certain image
processing techniques using machine learning (ML) algorithms to further automate recruitment activities.

In this work, we attempt to address two specific problems: 1) to recognise whether an image uploaded to the
recruitment portal contains a human face, which we propose to solve using a face detection algorithm based on Haar
cascade classifiers; and 2) to check whether the images uploaded with two or more applications are the same, which
we propose to solve using an image similarity detection algorithm based on certain ML techniques. Face detection
algorithms work on facial features such as the spacing of the eyes, the bridge of the nose, and the contours of the
lips, ears, and chin. Face detection has numerous applications in security (authentication and authorization),
defense, marketing, healthcare, hospitality, face recognition, lip reading, and auto-focus.

The rest of the paper is organized as follows: Section 2 provides a brief literature survey. Section 3 describes the
development and evaluation of the face detection system for screening e-recruitment applications. Section 4
discusses the development of the similarity detection system. Section 5 summarizes the work and outlines future
directions.


2. RELATED WORK 

Research in face detection and recognition has been actively pursued over the last several decades, and a
significant number of works have been reported in this area. Only a few notable works among them are described here,
beginning with some of the literature surveys on face detection and recognition. In 2003, Lewis et al. [3] presented
a detailed review of the psychological evidence on how the brain detects faces. It is shown that, with the use of
face recognition systems, it is possible to identify or verify the identity of individuals in a matter of a few
seconds.

In 2009, Jafri et al. [4] presented an overview of various face recognition techniques, examining the benefits and
limitations of different face recognition algorithms and describing the applications and difficulties involved in
each of them.

In 2010, Degtyarev et al. [5] proposed a set of parameters for evaluating the quality of face detection algorithms,
performing objective comparisons, and determining the current state-of-the-art face detection algorithm. They
compared seven face detection algorithms and reported the results of their comparison. In 2010, Zhang et al. [6]
surveyed the advances in face detection made over the preceding decade, in the hope of seeing better face detection
algorithms developed in the future. They classified the surveyed techniques according to the way features are
extracted and the type of learning algorithm employed.

In 2013, Roomi et al. [7] presented a survey of face recognition works reported in the past decade, focusing mainly
on those not covered in other similar surveys. They categorized these works into appearance-based, feature-based, and
soft-computing-based approaches, and presented a comparative study of the merits and demerits of each approach.

In 2015, Farfade et al. [8] proposed a deep dense face detector method for multi-view face detection. The method does
not require pose or landmark annotation and is able to detect faces in a wide range of orientations using a single
model based on deep convolutional neural networks, with minimal complexity. In 2018, Hua et al. [9] presented a joint
optimal solution to the face representation and matching problems in face verification using a unified framework.
They presented a second-order face representation method for face pairs and a unified face verification framework in
which the feature extractor and the subsequent binary classification model can be selected flexibly.

In 2020, Kortli et al. [10] presented a survey of some of the well-known theories and algorithms used in face
recognition. They reported a detailed comparison of these techniques in terms of robustness, accuracy, complexity,
and discrimination, and gave an overview of the most commonly used databases for both supervised and unsupervised
learning. Frischholz has consolidated useful information on face detection and recognition problems in [11], which
provides links to software, datasets, algorithms, selected publications, and other resources related to face
detection and recognition.

There are a few studies exploring the use of artificial intelligence (AI) techniques for recruitment applications
such as screening candidates, establishing relationships, making unbiased decisions, scheduling, and handling
applicants' social media communications. Some of the works exploring AI techniques for recruitment activities are as
follows. In 2018, Upadhyay et al. [12] reviewed the applications of AI tools in the hiring process and their
practical implications. They highlighted the strategic shift in the recruitment industry caused by the adoption of
AI in the recruitment process, and found that applying AI to manage the recruitment process leads to efficiency as
well as qualitative gains for both clients and candidates.

In 2019, Albert [13] investigated how companies use AI tools such as chatbots, screening software, and task
automation in the recruitment and selection of candidates. Along similar lines, Weinert et al. [14] examined the use
of AI techniques by companies for the selection and assessment of human resources, and the various challenges
involved. In 2019, Nawaz [15] explored the application of face detection to the recruitment process, demonstrating
the use of principal component analysis techniques to detect duplicate faces and thereby duplicate applications.

In 2019, Nawaz [16] examined the effect of AI techniques on the recruitment effectiveness of software companies,
using data collected through a structured questionnaire from 100 human resource professionals. In 2019, Esch et al.
[17] studied how potential candidates regard the use of AI in the recruitment process, and whether its use influences
their likelihood of applying for a job. They showed that the novelty of using AI in the recruitment process mediates,
and further positively influences, job application likelihood.


3. FACE DETECTION SYSTEM FOR SCREENING OF APPLICATIONS 

Figure 1 shows the block diagram of the complete face detection system that we implemented. The photo uploaded by an
applicant is fetched and fed as input to the face detection algorithm. If the algorithm detects a face, the
application is accepted. If no face is detected, the photo is added to a list of images to be inspected manually.
This list is made available on the screening portal, where screening personnel inspect each photo and either accept
the application if the photo is proper or reject it otherwise.
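The accept-or-queue workflow of Figure 1 can be sketched as follows. This is a minimal illustration, not the deployed URSC software: `screen_applications`, `ScreeningResult` and the `contains_face` callback are hypothetical names, and the face detector is passed in as a function so that any detector (for example, one built on OpenCV's Haar cascade) can be plugged in. Note that the sketch never rejects anything; it only accepts or defers to a human, as in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class ScreeningResult:
    """Outcome of the automatic pass over one advertisement's applications."""
    accepted: List[str] = field(default_factory=list)
    manual_review: List[str] = field(default_factory=list)


def screen_applications(
    photos: Iterable[Tuple[str, object]],
    contains_face: Callable[[object], bool],
) -> ScreeningResult:
    """Accept applications whose photo contains a face; queue the rest for a human.

    Photos in which no face is detected are only added to the manual-review
    list. Rejection is left entirely to the screening personnel.
    """
    result = ScreeningResult()
    for app_id, photo in photos:
        if contains_face(photo):
            result.accepted.append(app_id)
        else:
            result.manual_review.append(app_id)
    return result
```

The `manual_review` list corresponds to the list of suspected invalid photos shown on the screening portal.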
Figure 1. Block diagram of face detection system


3.1. Face detection algorithm 

Face detection is an image processing technique for identifying human faces in images and videos. In psychology, the
term also refers to the process by which humans locate and attend to faces in a visual scene [3]. Face detection is a
specific case of object detection, where the face is the object to be detected; the task of object detection is to
find the locations and sizes of all objects in an image that belong to a given class. In this work, we perform face
detection using Haar cascade classifiers. Face detection using Haar feature-based cascade classifiers is a machine
learning approach in which a cascade function is trained on a large number of positive and negative images [18], and
the trained cascade is then used to detect similar objects in other images. Haar features are similar to
convolutional kernels: each feature is a single value obtained by subtracting the sum of pixels under the white
rectangle from the sum of pixels under the black rectangle [19]. Computing these sums is simplified by using integral
images. A large number of features are computed using all possible sizes and locations of each kernel. The three
types of Haar features are: a) edge features, b) line features, and c) four-rectangle features. Figure 2 shows the
various types of Haar features for face detection.
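The feature computation described above can be illustrated with a small pure-Python sketch. The helper names are hypothetical, and a real system would use NumPy or OpenCV's `cv2.integral`, which produces the same zero-padded summed-area table. The sketch builds an integral image, sums any rectangle with four table lookups, and evaluates one two-rectangle edge feature as the black-rectangle sum minus the white-rectangle sum. Which half is drawn white and which black is an arbitrary choice made here for illustration.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Summed-area table with one row/column of zero padding.

    ii[y][x] holds the sum of all pixels img[r][c] with r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle (edge) feature: top half white, bottom half black.

    Value = sum under the black rectangle - sum under the white rectangle.
    """
    half = h // 2
    white = rect_sum(ii, x, y, w, half)
    black = rect_sum(ii, x, y + half, w, half)
    return black - white
```

Because every rectangle sum costs four lookups regardless of its size, evaluating the very large number of candidate features at all sizes and positions stays affordable.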

The edge features capture the property that the region of the eyes is often darker than the region of the nose and
cheeks, while the line features capture the property that the eyes are darker than the bridge of the nose. These
features respond only when the window is placed on the face region; the same windows applied to the cheeks or any
other part of the image are irrelevant. During training, every feature is applied to all the training images, and
for each feature the best threshold that separates the faces into positive and negative classes is found. The
features with the minimum error rate are selected, as these are the features that best classify face and non-face
images. We have used the pre-trained Haar cascade classifier model provided by the OpenCV library [19].
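A pre-trained OpenCV cascade can be loaded and applied roughly as follows. This is a minimal sketch, not the authors' code: it assumes the `opencv-python` package, which bundles the cascade XML files under `cv2.data.haarcascades`, and uses the frontal-face model `haarcascade_frontalface_default.xml`. The paper does not name the model file or the `scaleFactor`, `minNeighbors` and `minSize` values it used, so those shown are illustrative, and the helper names `load_face_cascade` and `has_face` are hypothetical.

```python
def load_face_cascade():
    """Load the pre-trained frontal-face Haar cascade bundled with opencv-python."""
    import cv2  # deferred import so has_face() can be exercised without OpenCV installed

    path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    cascade = cv2.CascadeClassifier(path)
    if cascade.empty():
        raise IOError("could not load Haar cascade from " + path)
    return cascade


def has_face(gray_image, cascade, scale_factor=1.1, min_neighbors=5, min_size=(30, 30)):
    """Return True if the cascade finds at least one face in a grayscale image.

    detectMultiScale scans the image at several scales (the image is shrunk by
    scale_factor at each step) and keeps a candidate only if it is supported by
    at least min_neighbors overlapping detections. It returns one (x, y, w, h)
    box per face, or an empty result when no face is found.
    """
    faces = cascade.detectMultiScale(
        gray_image,
        scaleFactor=scale_factor,
        minNeighbors=min_neighbors,
        minSize=min_size,
    )
    return len(faces) > 0
```

In use, a photo would be read with `cv2.imread`, converted with `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)`, and passed to `has_face` together with `load_face_cascade()`; a `False` result would send the application to the manual-review list.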
Figure 2. Illustration of various types of Haar features for face detection (Courtesy: Figure taken from [20])


3.2. Evaluation of face detection algorithm 

We ran the face detection algorithm for some selected recruitment advertisements. Table 1 shows the evaluation
statistics of the face detection system. Column 4 shows that some valid photos are also detected as invalid. Hence,
the output of the face detection algorithm cannot be used blindly: the list of suspected invalid photos has to be
inspected manually to identify the actual invalid photos. This makes the face detection system semi-automatic.
Although the system cannot replace human intervention completely, it drastically reduces the human effort involved
in screening recruitment applications. The sixth column of Table 1 shows the percentage reduction in manual effort,
i.e. the share of screened applications that were not flagged for manual inspection. The average reduction is
98.55%, which means that only 1.45% of the original manual effort is required when screening with the face detection
system. For example, for serial no. 1, the face detection system reduced the number of applications to be screened
manually from 4145 to 79; likewise, the reduction is from 30008 to 304 for serial no. 6. The last column gives the
face detection accuracy, i.e. the percentage of suspected invalid photos that were actually invalid. The average face
detection accuracy is 76.41%, which is a reasonably good value. This approach not only reduces the costs involved in
recruitment activities but also promises more consistent results and requires much less time than manual screening.
Moreover, no application with a valid photo is rejected automatically, since every rejection is made by a human.
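The two derived columns of Table 1 can be recomputed from the raw counts. The formulas below are inferred from the counts in the table: the reduction in manual effort is the share of screened applications that were not flagged, and the detection accuracy is the share of flagged photos that were truly invalid. With these definitions the tabulated values are reproduced to within 0.01. The function names are hypothetical.

```python
def reduction_in_manual_effort(total_screened: int, suspected_invalid: int) -> float:
    """Percentage of applications that no longer need manual inspection."""
    return 100.0 * (total_screened - suspected_invalid) / total_screened


def detection_accuracy(invalid_flagged: int, suspected_invalid: int) -> float:
    """Percentage of flagged (suspected invalid) photos that really were invalid."""
    return 100.0 * invalid_flagged / suspected_invalid


if __name__ == "__main__":
    # Serial no. 1 of Table 1: 4145 screened, 79 flagged, 61 of them truly invalid.
    print(reduction_in_manual_effort(4145, 79), detection_accuracy(61, 79))
```

Read this way, "accuracy" here is the precision of the invalid-photo flag rather than accuracy over all screened photos.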

Figure 3 shows a few suspected invalid photos detected by the face detection algorithm. A surprising variety of photos
were uploaded by candidates along with their applications. Invalid photos range from animations, signatures, marks
cards, snapshots of mobile phones, and WhatsApp images to random images taken from the internet and random photos
clicked using mobile phones.

Due to data confidentiality issues, only generic images are shown in Figure 3. However, several other kinds of
images, such as certificates, grade cards, and photos (which are of a restricted nature and cannot be published),
were also classified as invalid by the algorithm. Examples of such photos include: 1) photos in which one side of the
face is completely covered by hair, so that only the other side is visible, 2) photos captured with the head covered
by a cap or a turban so that part of the forehead is not visible, 3) photos taken such that parts of the forehead,
cheeks, and chin are not visible, and 4) photos with goggles covering the eyes. In these cases, the face detection
algorithm failed to detect a human face for the following reasons: 1) If the photo is taken while wearing
spectacles, the algorithm fails to detect facial features such as the spacing of the eyes, and the contrasting line
features around the eyebrows and eyelids are lost. 2) If a cap or turban covers part of the forehead and eyebrows so
that the complete face is not visible, the algorithm again fails to extract all the facial features. 3) If the face
is rotated so that only one side is visible and the other side is partially or completely hidden, the algorithm is
not able to capture all the required features. 4) If the resolution of the image is so low that the detection window
is larger than the photo, no face can be detected.


Table 1. Evaluation statistics of face detection system 


Sl.      Total         Suspected   Valid photos  Invalid photos  Reduction in   Face detection
No.      applications  invalid     detected as   detected as     manual effort  accuracy (%)
         screened      photos      invalid       invalid         (%)
1        4145          79          18            61              98.09          77.21
2        3706          62          12            50              98.32          80.64
3        45059         414         156           258             99.08          62.31
4        25700         237         62            175             99.07          73.83
5        10280         106         19            87              98.96          82.07
6        30008         304         51            253             98.98          83.22
7        8787          144         33            111             98.36          77.08
8        832           15          3             12              98.19          80.00
9        1727          27          3             24              98.43          88.88
10       869           17          7             10              98.04          58.82
Average  -             -           -             -               98.55          76.41
Figure 3. Sample invalid photos detected by face detection system
4. SIMILARITY DETECTION SYSTEM FOR PHOTOS

Two important techniques for comparing images are 1) comparison of histograms and 2) template matching. A histogram
is a graphical representation of the distribution of pixel values in a digital image. The histogram intersection
algorithm was proposed by Swain and Ballard [21]. Histogram intersection does not require accurate separation of the
object from its background, and it is robust to occluding objects in the foreground. Histograms are invariant to
translation, and they change only slowly under different view angles, scales, and in the presence of occlusions
[22]. Histogram comparison is one of the simplest and fastest methods for finding similarities between images. The
underlying assumption is that a particular type of picture will have a particular color in abundance. For example, a
picture of a forest will have a lot of green, and a picture of a banana will have a lot of yellow. So, if two
pictures of forests are compared, their histograms will show some similarity, as both of them contain a lot of green.
Further details on comparison of histograms can be found in [21], [22].
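A minimal sketch of histogram comparison using Swain and Ballard's histogram intersection on grayscale pixel values follows. The bin count and helper names are assumptions for illustration; OpenCV offers the same measure on normalized histograms through `cv2.calcHist` and `cv2.compareHist` with `cv2.HISTCMP_INTERSECT`. Because a histogram only counts pixel values, two images whose pixels are merely rearranged score as identical, which is both the invariance and the weakness noted above.

```python
from typing import List, Sequence


def histogram(pixels: Sequence[int], bins: int = 16, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of 8-bit pixels (values 0..max_value-1).

    Each bin holds the fraction of pixels falling into it, so the bins sum to 1.
    """
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    n = len(pixels)
    return [c / n for c in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Swain-Ballard intersection of two normalized histograms, in [0, 1].

    1.0 means the two value distributions are identical; 0.0 means no overlap.
    """
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Comparing two photos then reduces to building one histogram per image and intersecting them, which is cheap compared with template matching.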

Template matching is a technique in digital image processing for finding small parts of an image that match a
template image. A basic method of template matching uses an image template tailored to the specific feature of the
search image that we want to detect. The cross-correlation output is highest at places where the image structure
matches the template (mask) structure, i.e. where large image values get multiplied by large mask values. Since all
possible positions of the template with respect to the search image are considered, the position with the highest
score is the best match [23], [24]. Template matching is known to work well for identical images with the same size
and orientation, which is mostly the case here. Further details on template matching can be found in [23], [24].
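A brute-force sketch of template matching is shown below. It scores each position with zero-mean normalized cross-correlation, the score OpenCV calls `TM_CCOEFF_NORMED`; the paper does not state which matching score it used, and the function names are hypothetical. In practice `cv2.matchTemplate` followed by `cv2.minMaxLoc` does the same search far faster. When the template and the search image have the same size, as for two uploaded photos, only one position exists and its score directly measures how alike the two images are.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def _ncc(patch: List[float], tmpl: List[float]) -> float:
    """Zero-mean normalized cross-correlation of two equal-length pixel lists, in [-1, 1]."""
    n = len(tmpl)
    mp, mt = sum(patch) / n, sum(tmpl) / n
    dp = [p - mp for p in patch]
    dt = [t - mt for t in tmpl]
    den = math.sqrt(sum(x * x for x in dp) * sum(y * y for y in dt))
    # A flat (constant) patch or template has no structure to correlate.
    return sum(x * y for x, y in zip(dp, dt)) / den if den else 0.0


def match_template(image: Image, template: Image) -> Tuple[Tuple[int, int], float]:
    """Slide the template over every position of the image.

    Returns the (x, y) of the top-left corner with the highest score, and that score.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    tmpl = [v for row in template for v in row]
    best_pos, best_score = (0, 0), -math.inf
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = [image[y + j][x + i] for j in range(th) for i in range(tw)]
            score = _ncc(patch, tmpl)
            if score > best_score:
                best_pos, best_score = (x, y), score
    return best_pos, best_score
```

Normalizing by the patch and template energies makes the score insensitive to overall brightness and contrast changes, which plain cross-correlation is not.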

In this study, we computed the similarity score using a combination of both approaches: comparison of histograms and
template matching. Python's OpenCV library is used for the implementation. Since neither method alone produced
satisfactory results, we combined them using a weighted combination. Histogram comparison was assigned a lower weight
of 0.1, as it was found to be less accurate than template matching, and template matching was assigned a higher
weight of 0.9. Two images are compared, and a similarity score indicating how similar they are is returned. For
example, a similarity score of 100% indicates that the same image is being compared, and a similarity score of 0%
indicates that the two images are totally different.
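The weighted combination can be written as S = 100 × (0.1 × S_hist + 0.9 × S_template), expressed as a percentage. The sketch below assumes both component scores are already scaled to [0, 1], for example by clipping negative correlations to 0; the paper does not say how its component scores were normalized, and the function name is hypothetical.

```python
HIST_WEIGHT = 0.1      # weight of histogram comparison, as stated in the text
TEMPLATE_WEIGHT = 0.9  # weight of template matching, as stated in the text


def combined_similarity(hist_score: float, template_score: float) -> float:
    """Weighted similarity in percent (0 = totally different, 100 = same image).

    Both component scores are assumed to lie in [0, 1]; values outside that
    range are rejected rather than silently clipped.
    """
    for s in (hist_score, template_score):
        if not 0.0 <= s <= 1.0:
            raise ValueError("component scores must lie in [0, 1]")
    return 100.0 * (HIST_WEIGHT * hist_score + TEMPLATE_WEIGHT * template_score)
```

With these weights, a pair can reach 100% only if both component scores are maximal, while a perfect histogram match alone contributes just 10%.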

Each image in an advertisement is compared with all other images. Since every one of the n(n-1)/2 pairs among n
images is compared, the time complexity is O(n²). For each comparison, the algorithm returns a similarity score
ranging from 0% to 100%. In this study, we considered only the cases with a similarity score of 100%: pairs of images
with a score of 100% are treated as similar (duplicate) images. The algorithm is computationally very intensive and
requires huge computing resources. A single comparison of one pair of images on a desktop PC (8 GB RAM, Intel i7-6700
CPU @ 3.40 GHz with 8 logical cores, no graphics card) took around one minute.
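The all-pairs comparison can be sketched as follows, with the similarity function passed in (for instance, a weighted combination of histogram and template scores). The helper names and the small floating-point tolerance around the 100% threshold are assumptions, not details from the paper. The number of comparisons grows as n(n-1)/2: for an advertisement with 45059 applications, the largest in Table 1, that is 1,015,134,211 pairs, which illustrates why the approach is so resource-hungry.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def find_duplicate_pairs(
    photos: Dict[str, object],
    similarity: Callable[[object, object], float],
    threshold: float = 100.0,
    tolerance: float = 1e-6,
) -> List[Tuple[str, str]]:
    """Compare every unordered pair of photos once; return pairs scoring >= threshold.

    For n photos this evaluates n*(n-1)/2 pairs, i.e. O(n^2) comparisons. The
    tolerance guards against a "100%" score landing just below 100.0 because
    of floating-point rounding.
    """
    duplicates = []
    for (id_a, img_a), (id_b, img_b) in combinations(photos.items(), 2):
        if similarity(img_a, img_b) >= threshold - tolerance:
            duplicates.append((id_a, id_b))
    return duplicates


def number_of_comparisons(n: int) -> int:
    """Number of unordered image pairs among n images."""
    return n * (n - 1) // 2
```

Since each pair is independent, the loop parallelizes trivially across cores, but the quadratic pair count remains the dominant cost.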

Although the proposed technique works reasonably well and has produced some promising results, data confidentiality
restrictions prevent us from publishing any of the images detected by the similarity detection system. We found a
number of instances where the same candidate applied multiple times to the same advertised post using the same photo.
In one such case, a candidate had applied 5 times to the same post using the same photo.


5. SUMMARY AND FUTURE WORK 

In this work, we have explored two ML techniques, face detection and similarity detection, for automating the
screening of recruitment applications. The use of the face detection system drastically reduced (by 98.55% on
average) the manual effort required for screening recruitment applications. A detailed analysis of when and why face
detection fails was also carried out. The similarity detection system was developed to compare two images and
determine their similarity score. Although the similarity detection system works reasonably well, it is very
resource-hungry and requires a large computing infrastructure.

In future, state-of-the-art deep learning algorithms such as convolutional neural networks (CNN) for face detection
[25], [26] can be explored to detect and eliminate non-face images. Instead of using the pre-trained models provided
by OpenCV, face detection models can be trained on custom datasets of face and non-face images and then used for face
detection. The possibility of developing hybrid techniques that combine the outputs of multiple face detection
algorithms can also be explored. Feature mapping techniques can be explored for building similarity detection
systems for face images, as can sparse-coding-based image similarity detection techniques [27].